Why Supertagging Is Hard
Abstract
Tree adjoining grammar parsers can use a supertagger as a preprocessor to help disambiguate the category of words and thus speed up the parsing phase dramatically. However, since errors in supertagging propagate to this phase, it is vital to keep the error rate of the supertagging phase reasonably low. With the very large tagsets that come from extracted grammars, this error rate can approach 20% using standard Hidden Markov Model techniques. To combat this problem, we can accept increased ambiguity in the supertagger output in exchange for higher accuracy. I propose a new approach to introducing ambiguity into the supertags, looking for a suitable trade-off. The method is based on a representation of each supertag as a feature structure and consists of grouping the values, or a subset of the values, of certain features, generally those that are hardest to predict.
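
A minimal sketch of the grouping idea, assuming a toy feature inventory: the feature names, values, and groupings below are illustrative stand-ins, not the paper's actual grammar. The point is only to show how collapsing values of a hard-to-predict feature maps many fine-grained supertags onto one coarser, more predictable class.

    # Sketch only: a supertag viewed as a feature structure (feature -> value),
    # coarsened by merging the values of a hard-to-predict feature.
    Supertag = dict

    # Hypothetical grouping: fine-grained subcategorization frames are
    # collapsed into two equivalence classes.
    VALUE_GROUPS = {
        "subcat": {
            "np": "arg",
            "np_np": "arg",
            "np_pp": "arg",
            "s": "clausal",
            "np_s": "clausal",
        },
    }

    def coarsen(tag: Supertag) -> Supertag:
        """Map a fine-grained supertag to its coarsened equivalence class."""
        return {
            feat: VALUE_GROUPS.get(feat, {}).get(val, val)
            for feat, val in tag.items()
        }

    fine = {"cat": "verb", "subcat": "np_pp", "voice": "active"}
    print(coarsen(fine))  # {'cat': 'verb', 'subcat': 'arg', 'voice': 'active'}

Each coarse tag now stands for a set of original supertags: the tagger predicts the coarse tag, which is easier and so more accurate, and the parser receives all members of the class, which is where the increased ambiguity comes from.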
Related papers
Stacking or Supertagging for Dependency Parsing - What's the Difference?
Supertagging was recently proposed to provide syntactic features for statistical dependency parsing, contrary to its traditional use as a disambiguation step. We conduct a broad range of controlled experiments to compare this specific application of supertagging with another method for providing syntactic features, namely stacking. We find that in this context supertagging is a form of stacking...
HPSG Supertagging: A Sequence Labeling View
Supertagging is a widely used speed-up technique for deep parsing. In another respect, supertagging has been exploited in NLP tasks other than parsing to make use of the rich syntactic information given by the supertags. However, the performance of the supertagger is still a bottleneck for such applications. In this paper, we investigated the relationship between supertagging and parsing, not just to...
CCG Supertagging with a Recurrent Neural Network
Recent work on supertagging using a feedforward neural network achieved significant improvements for CCG supertagging and parsing (Lewis and Steedman, 2014). However, their architecture is limited to considering local contexts and does not naturally model sequences of arbitrary length. In this paper, we show how directly capturing sequence information using a recurrent neural network leads to f...
TOWARDS EFFICIENT STATISTICAL PARSING USING LEXICALIZED GRAMMATICAL INFORMATION by
For a long time, the goal of wide-coverage natural language parsing had remained elusive. Much progress has been made recently, however, with the development of lexicalized statistical models of natural language parsing. Although lexicalized tree adjoining grammar (TAG) is a lexicalized grammatical formalism whose development predates these recent advances, its application in lexicalized statis...
A Simple Approach for HPSG Supertagging Using Dependency Information
In a supertagging task, sequence labeling models are commonly used, but their limited ability to model long-distance information presents a bottleneck to further improvements. In this paper, we modeled this long-distance information in a dependency formalism and integrated it into the process of HPSG supertagging. The experiments showed that the dependency information is very informative for...
Publication date: 2004